Using the Lemmatization Technique for Phonetic Transcription in Text-to-Speech System

Authors

  • Jakub Kanis
  • Ludek Müller
Abstract

This paper deals with a lemmatization technique and its use for the phonetic transcription of exceptional words. The lemmatizer is based on language morphology: it uses a lexicon of basic word forms and inverts the derivation rules to obtain the lemmatization rules needed to find word bases. We describe the lemmatization algorithm and the modifications of the lemmatizer needed to transcribe exceptional words. The main goal of the designed system is to reduce the memory required by the lexicon of exceptional words. The experimental results show savings from 18.3% (English) to 98.4% (Finnish) of the size of the full lexicon. Hence, the system is well suited to highly inflectional and agglutinative languages.
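The idea of inverting derivation rules can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix rules, the tiny lexicon, and the names `DERIVATION_RULES`, `invert_rules`, and `lemmatize` are all hypothetical, and the sketch covers only suffix-based lemmatization, not the modifications for transcribing exceptional words.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical derivation rules as (base_suffix, derived_suffix) pairs,
# e.g. "make" -> "making" replaces the base suffix "e" with "ing".
DERIVATION_RULES: List[Tuple[str, str]] = [
    ("", "s"),
    ("", "ed"),
    ("e", "ing"),
    ("y", "ies"),
]

# Hypothetical lexicon of basic word forms.
LEXICON: Set[str] = {"walk", "make", "city"}


def invert_rules(rules: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Swap each rule so that it maps a derived suffix back to a base suffix."""
    return [(derived, base) for base, derived in rules]


def lemmatize(word: str, lexicon: Set[str],
              lem_rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the base form of `word`, or None if no rule yields a lexicon entry."""
    if word in lexicon:
        return word
    # Try longer derived suffixes first so that "ies" is tested before "s".
    for derived, base in sorted(lem_rules, key=lambda rule: -len(rule[0])):
        if derived and word.endswith(derived):
            candidate = word[: len(word) - len(derived)] + base
            # Accept a candidate only if the lexicon confirms it is a base form.
            if candidate in lexicon:
                return candidate
    return None


LEM_RULES = invert_rules(DERIVATION_RULES)

if __name__ == "__main__":
    for form in ("walked", "making", "cities", "unknown"):
        print(form, "->", lemmatize(form, LEXICON, LEM_RULES))
```

Because only base forms are stored and inflected forms are recovered through rules, a design along these lines would be expected to save the most space in languages where each base has many inflected forms.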

Similar resources

Stages and methods of preparing syllable and diphone speech databases for a Persian text-to-speech system

Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two speech databases for Persian, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their relative advantages. ...

Data Driven Approaches to Phonetic Transcription with Integration of Automatic Speech Recognition and Grapheme-to-Phoneme for Spoken Buddhist Sutra

We propose a new approach for performing phonetic transcription of text that utilizes automatic speech recognition (ASR) to help traditional grapheme-to-phoneme (G2P) techniques. This approach was applied to transcribe Chinese text into Taiwanese phonetic symbols. By augmenting the text with speech and using automatic speech recognition with a sausage searching net constructed from multiple pro...

Using speech recognition technique for constructing a phonetically transcribed Taiwanese (Min-Nan) text corpus

Collection of Taiwanese text corpus with phonetic transcription suffers from the problems of multiple pronunciation variation. By augmenting the text with speech, and using automatic speech recognition with a sausage searching net constructed from the multiple pronunciations of the text corresponding to its speech utterance, we are able to reduce the effort for phonetic transcription. By using ...

Automatic generation of phonetic transcriptions for large speech corpora

We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...

Publication date: 2004